Take-home_Ex01

Published

February 5, 2024

1. Objective

We will be applying appropriate spatial point patterns analysis methods learned in class to discover the geographical and spatio-temporal distribution of Grab hailing services locations in Singapore.

2. Getting Started

2.1 Loading R packages

The R packages that we will be using in this exercise are as follows:

  • arrow: For reading parquet files (Grab-Posisi Dataset)

  • lubridate: To handle the date formatting

  • sf: Import, manage and process vector-based geospatial data in R.

  • tidyverse: a collection of packages for data science tasks

  • spatstat: Wide range of useful functions for point pattern analysis and derive kernel density estimation (KDE) layer.

  • spNetwork: provides functions to perform Spatial Point Patterns Analysis such as kernel density estimation (KDE) and K-function on network. It also can be used to build spatial matrices (‘listw’ objects like in ‘spdep’ package) to conduct any kind of traditional spatial analysis with spatial weights based on reticular distances.

  • tmap: Provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API.

  • raster: reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster). In this hands-on exercise, it will be used to convert image output generate by spatstat into raster format.

  • maptools: Provides a set of tools for manipulating geographic data. In this take-home exercise, we mainly use it to convert Spatial objects into ppp format of spatstat.

  • # classInt, viridis, rgdal

Show code
pacman::p_load(arrow, lubridate, sf, tidyverse, spNetwork, tmap, 
               spatstat, raster, maptools)

2.2 Importing the datasets

The datasets that we will be using are as follow:

Using read_parquet() function from arrow package to import the grab data, then changing pingtimestamp column to datetime object

Show code
grab_df0 <- read_parquet("data/aspatial/part-00000.snappy.parquet")
grab_df1 <- read_parquet("data/aspatial/part-00001.snappy.parquet")
grab_df2 <- read_parquet("data/aspatial/part-00002.snappy.parquet")
grab_df3 <- read_parquet("data/aspatial/part-00003.snappy.parquet")
grab_df4 <- read_parquet("data/aspatial/part-00004.snappy.parquet")
grab_df5 <- read_parquet("data/aspatial/part-00005.snappy.parquet")
grab_df6 <- read_parquet("data/aspatial/part-00006.snappy.parquet")
grab_df7 <- read_parquet("data/aspatial/part-00007.snappy.parquet")
grab_df8 <- read_parquet("data/aspatial/part-00008.snappy.parquet")
grab_df9 <- read_parquet("data/aspatial/part-00009.snappy.parquet")
Show code
grab_df <- bind_rows(grab_df0,
                     grab_df1,
                     grab_df2,
                     grab_df3,
                     grab_df4,
                     grab_df5,
                     grab_df6,
                     grab_df7,
                     grab_df8,
                     grab_df9)

grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)

Transforming the coordinate system at the same time when we are importing the data

Show code
sg_roads <- st_read(dsn = "data/geospatial", layer = "gis_osm_roads_free_1") %>% st_transform(crs = 3414)
Reading layer `gis_osm_roads_free_1' from data source 
  `/Users/jacksontan/Documents/Sashimii0219/IS415-GAA/Take-home_Ex/Take-home_Ex01/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS:  WGS 84

Transforming the coordinate system at the same time when we are importing the data

Show code
mpsz2019 <- st_read("data/geospatial", layer = "MPSZ-2019") %>% st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/jacksontan/Documents/Sashimii0219/IS415-GAA/Take-home_Ex/Take-home_Ex01/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

3. Geospatial Data Wrangling

Before we begin exploring the data, we will first need to perform some data pre-processing on the datasets that we have imported.

3.1 Data Pre-processing - MPSZ2019

3.1.1 Excluding Outer Islands

As grab won’t be able to reach offshore places, we will exclude the outer islands from this dataset. We will do this through the following steps:

We will first take a look at the unique planning areas in Singapore using unique() on the PLN_AREA_N column of mpsz2019 dataset.

Show code
unique(mpsz2019$PLN_AREA_N)
 [1] "MARINA EAST"             "RIVER VALLEY"           
 [3] "SINGAPORE RIVER"         "WESTERN ISLANDS"        
 [5] "MUSEUM"                  "MARINE PARADE"          
 [7] "SOUTHERN ISLANDS"        "BUKIT MERAH"            
 [9] "DOWNTOWN CORE"           "STRAITS VIEW"           
[11] "QUEENSTOWN"              "OUTRAM"                 
[13] "MARINA SOUTH"            "ROCHOR"                 
[15] "KALLANG"                 "TANGLIN"                
[17] "NEWTON"                  "CLEMENTI"               
[19] "BEDOK"                   "PIONEER"                
[21] "JURONG EAST"             "ORCHARD"                
[23] "GEYLANG"                 "BOON LAY"               
[25] "BUKIT TIMAH"             "NOVENA"                 
[27] "TOA PAYOH"               "TUAS"                   
[29] "JURONG WEST"             "SERANGOON"              
[31] "BISHAN"                  "TAMPINES"               
[33] "BUKIT BATOK"             "HOUGANG"                
[35] "CHANGI BAY"              "PAYA LEBAR"             
[37] "ANG MO KIO"              "PASIR RIS"              
[39] "BUKIT PANJANG"           "TENGAH"                 
[41] "SELETAR"                 "SUNGEI KADUT"           
[43] "YISHUN"                  "MANDAI"                 
[45] "PUNGGOL"                 "CHOA CHU KANG"          
[47] "SENGKANG"                "CHANGI"                 
[49] "CENTRAL WATER CATCHMENT" "SEMBAWANG"              
[51] "WESTERN WATER CATCHMENT" "WOODLANDS"              
[53] "NORTH-EASTERN ISLANDS"   "SIMPANG"                
[55] "LIM CHU KANG"           
Show code
plot(mpsz2019)

Note that there are 3 areas with island in their name, mainly “NORTH-EASTERN ISLANDS”, “SOUTHERN ISLANDS”, and “WESTERN ISLANDS”.

To exclude the islands, we simply have to pass a condition to exclude these islands in the subset function.

Show code
mpsz2019_new <- subset(mpsz2019, !(PLN_AREA_N %in% 
            c("NORTH-EASTERN ISLANDS", "SOUTHERN ISLANDS", "WESTERN ISLANDS")))

Great! Now let’s check if we indeed removed the maps!

Show code
tmap_mode('plot')
before <- tm_shape(mpsz2019) +
  tm_polygons("PLN_AREA_N") +
  tmap_options(max.categories = 53)
after <- tm_shape(mpsz2019_new) +
  tm_polygons("PLN_AREA_N") +
  tmap_options(max.categories = 53)

tmap_arrange(before, after)

3.1.2 Invalid Geometries

We will be using the st_is_valid() function to test for invalid geometries.

Show code
test <- st_is_valid(mpsz2019_new,reason=TRUE)

# Number of invalid geometries
length(which(test!= "Valid Geometry"))
[1] 3
Show code
# Reason
test[which(test!= "Valid Geometry")]
[1] "Ring Self-intersection[26922.5243000389 27027.610899987]" 
[2] "Ring Self-intersection[38991.2589000446 31986.5599999869]"
[3] "Ring Self-intersection[14484.6860000313 31330.1319999856]"

We can see that there are 3 invalid geometries. Let’s fix them using st_make_valid().

Show code
mpsz2019_new<- st_make_valid(mpsz2019_new)
length(which(st_is_valid(mpsz2019_new) == FALSE))
[1] 0

3.1.3 Missing Values

Show code
mpsz2019_new[rowSums(is.na(mpsz2019_new))!=0,]
Simple feature collection with 0 features and 6 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] SUBZONE_N  SUBZONE_C  PLN_AREA_N PLN_AREA_C REGION_N   REGION_C   geometry  
<0 rows> (or 0-length row.names)

Using the code above, we can see that there are no missing values.

3.1.4 Creating boundary?

Show code
sg_boundary <- mpsz2019_new %>% st_union()
plot(sg_boundary)

3.2 Data Pre-processing - OpenStreetMap Road Dataset

3.2.1 Limiting the dataset

As the dataset contains data from Malaysia and Brunei as well, we will use st_intersection() to limit the data to only Singapore.

Show code
points_within_sg <- st_intersection(sg_roads, mpsz2019_new)

Now, we can see that in points_within_sg it only contain Singapore road data, combined with the other values from mpsz2019 like “PLN_AREA_N” used above.

Show code
colnames(points_within_sg)
 [1] "osm_id"     "code"       "fclass"     "name"       "ref"       
 [6] "oneway"     "maxspeed"   "layer"      "bridge"     "tunnel"    
[11] "SUBZONE_N"  "SUBZONE_C"  "PLN_AREA_N" "PLN_AREA_C" "REGION_N"  
[16] "REGION_C"   "geometry"  
Show code
head(points_within_sg)
Simple feature collection with 6 features and 16 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 31466.72 ymin: 30680.54 xmax: 32815.21 ymax: 30873.74
Projected CRS: SVY21 / Singapore TM
        osm_id code        fclass               name  ref oneway maxspeed layer
4052  23946437 5122   residential          Rhu Cross <NA>      F       50     0
9668  32605139 5131 motorway_link               <NA> <NA>      F       40     0
20076 46337834 5131 motorway_link               <NA> <NA>      F       50    -2
21690 49961799 5111      motorway East Coast Parkway  ECP      F       70     1
26543 74722808 5111      motorway East Coast Parkway  ECP      F       70     1
29808 99007260 5131 motorway_link               <NA> <NA>      F       50     1
      bridge tunnel   SUBZONE_N SUBZONE_C  PLN_AREA_N PLN_AREA_C       REGION_N
4052       F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
9668       F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
20076      F      T MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
21690      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
26543      T      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
29808      T      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
      REGION_C                       geometry
4052        CR LINESTRING (31889.45 30760....
9668        CR LINESTRING (32768.57 30857....
20076       CR LINESTRING (32815.21 30873....
21690       CR LINESTRING (32365.45 30845....
26543       CR LINESTRING (31611.63 30720....
29808       CR LINESTRING (31611.63 30720....

3.2.2 Invalid Geometries

Again, using the st_is_valid() function to test for invalid geometries.

Show code
test <- st_is_valid(points_within_sg,reason=TRUE)

# Number of invalid geometries
length(which(test!= "Valid Geometry"))
[1] 0
Show code
# Reason
test[which(test!= "Valid Geometry")]
character(0)

No invalid geometries!

3.2.3 Missing Values / Dropping Columns

Show code
points_within_sg[rowSums(is.na(points_within_sg))!=0,]
Simple feature collection with 232766 features and 16 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 2679.373 ymin: 23099.51 xmax: 50957.8 ymax: 50220.06
Projected CRS: SVY21 / Singapore TM
First 10 features:
         osm_id code        fclass           name  ref oneway maxspeed layer
4052   23946437 5122   residential      Rhu Cross <NA>      F       50     0
9668   32605139 5131 motorway_link           <NA> <NA>      F       40     0
20076  46337834 5131 motorway_link           <NA> <NA>      F       50    -2
29808  99007260 5131 motorway_link           <NA> <NA>      F       50     1
45723 140562813 5131 motorway_link           <NA> <NA>      F       70    -1
45728 140562819 5131 motorway_link           <NA> <NA>      F       50     0
45731 140562823 5131 motorway_link           <NA> <NA>      F       60    -2
45733 140562826 5131 motorway_link           <NA> <NA>      F       40     0
52966 150819034 5141       service Bay East Drive <NA>      B        0     0
84664 174717984 5153       footway           <NA> <NA>      B        0     0
      bridge tunnel   SUBZONE_N SUBZONE_C  PLN_AREA_N PLN_AREA_C       REGION_N
4052       F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
9668       F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
20076      F      T MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
29808      T      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
45723      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
45728      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
45731      F      T MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
45733      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
52966      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
84664      F      F MARINA EAST    MESZ01 MARINA EAST         ME CENTRAL REGION
      REGION_C                       geometry
4052        CR LINESTRING (31889.45 30760....
9668        CR LINESTRING (32768.57 30857....
20076       CR LINESTRING (32815.21 30873....
29808       CR LINESTRING (31611.63 30720....
45723       CR LINESTRING (32782.42 30754....
45728       CR LINESTRING (32645.37 30683....
45731       CR LINESTRING (32809.68 30108....
45733       CR LINESTRING (32609.11 30700....
52966       CR LINESTRING (32173.46 30036....
84664       CR LINESTRING (31750.06 30644....

By using the code above, we can see that majority of the missing values are in the ‘name’ and ‘ref’ column. Therefore, let’s drop the irrelevant columns first before we try it again!

Show code
sg_roads_new <- points_within_sg[c("osm_id", "code", "fclass", "PLN_AREA_N", "geometry")]

We only kept “osm_id”, “code”, “fclass”, and “PLN_AREA_N” columns.

Show code
sg_roads_new[rowSums(is.na(sg_roads_new))!=0,]
Simple feature collection with 0 features and 4 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] osm_id     code       fclass     PLN_AREA_N geometry  
<0 rows> (or 0-length row.names)

No more missing values here.

Our map so far:

Show code
tm_shape(sg_boundary) +
  tm_polygons() +
  tm_shape(sg_roads_new) +
  tm_lines("PLN_AREA_N")

3.3 Data Pre-processing - Grab-Posisi Dataset

The Grab-Posisi Dataset is an Aspatial dataset, different from the two we prepared above. As such, the pre-processing is slightly different too.

3.3.1 Getting the Origin and Destination Locations

The code below is a chain of dplyr pipes to group the trips by their id and extract the first pingtimestamp row of each trip in order to get the origin of it.

Show code
origin_df <- grab_df %>%
  group_by(trj_id) %>%
  arrange(pingtimestamp) %>% 
  filter(row_number()==1) %>% 
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         start_hr = factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))
Show code
destination_df <- grab_df %>%
  group_by(trj_id) %>%
  arrange(desc(pingtimestamp)) %>% 
  # Same as previous code but desc, so ending location
  filter(row_number()==1) %>%
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         end_hr = factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))

3.3.2 Converting to SF format from Dataframe

We will need the files in SF format first before we can use it for further geospatial analysis.

Show code
origin_sf <- st_as_sf(origin_df, 
                       coords = c("rawlng", "rawlat"),
                       crs=4326) %>%
  st_transform(crs = 3414)

dest_sf <- st_as_sf(destination_df, 
                       coords = c("rawlng", "rawlat"),
                       crs=4326) %>%
  st_transform(crs = 3414)

3.3.3 Invalid Geometries

Show code
test <- st_is_valid(origin_sf,reason=TRUE)
length(which(test!= "Valid Geometry"))
[1] 0
Show code
test <- st_is_valid(dest_sf,reason=TRUE)
length(which(test!= "Valid Geometry"))
[1] 0

3.3.4 Missing Files

Show code
origin_sf[rowSums(is.na(origin_sf))!=0,]
Simple feature collection with 0 features and 10 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
# A tibble: 0 × 11
# Groups:   trj_id [0]
# ℹ 11 variables: trj_id <chr>, driving_mode <chr>, osname <chr>,
#   pingtimestamp <dttm>, speed <dbl>, bearing <int>, accuracy <dbl>,
#   weekday <ord>, start_hr <fct>, day <fct>, geometry <GEOMETRY [m]>
Show code
dest_sf[rowSums(is.na(dest_sf))!=0,]
Simple feature collection with 0 features and 10 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
# A tibble: 0 × 11
# Groups:   trj_id [0]
# ℹ 11 variables: trj_id <chr>, driving_mode <chr>, osname <chr>,
#   pingtimestamp <dttm>, speed <dbl>, bearing <int>, accuracy <dbl>,
#   weekday <ord>, end_hr <fct>, day <fct>, geometry <GEOMETRY [m]>

No missing values, we are almost ready.

3.3.5 Removing points on the islands

Show code
origin_sf_new <- st_intersection(origin_sf, mpsz2019_new)
dest_sf_new <- st_intersection(dest_sf, mpsz2019_new)

To verify that the points that we removed is indeed from the islands, here’s a chunk of code to prove:

Show code
# Finding out points removed
diff_id <- origin_sf$trj_id[!(origin_sf$trj_id %in% origin_sf_new$trj_id)]

# Extracting full information of these points
outliers <- origin_sf[(origin_sf$trj_id %in% diff_id), ]

# Checking where these places are from
unique(st_intersection(outliers, mpsz2019)$PLN_AREA_N)
[1] "WESTERN ISLANDS"  "SOUTHERN ISLANDS"

They are indeed from “WESTERN ISLANDS” and “SOUTHERN ISLANDS”.

3.3.6 Dropping Unnecessary Columns

Now that our grab dataset is almost ready, we need to decide which column we should drop. Here are the columns in both origin_sf_new and dest_sf_new:

Show code
colnames(origin_sf_new)
 [1] "trj_id"        "driving_mode"  "osname"        "pingtimestamp"
 [5] "speed"         "bearing"       "accuracy"      "weekday"      
 [9] "start_hr"      "day"           "SUBZONE_N"     "SUBZONE_C"    
[13] "PLN_AREA_N"    "PLN_AREA_C"    "REGION_N"      "REGION_C"     
[17] "geometry"     
Show code
colnames(dest_sf_new)
 [1] "trj_id"        "driving_mode"  "osname"        "pingtimestamp"
 [5] "speed"         "bearing"       "accuracy"      "weekday"      
 [9] "end_hr"        "day"           "SUBZONE_N"     "SUBZONE_C"    
[13] "PLN_AREA_N"    "PLN_AREA_C"    "REGION_N"      "REGION_C"     
[17] "geometry"     

We will definitely be dropping the columns merged from mpsz2019_new (other than PLN_AREA_N), but what about “driving_mode”, “osname”, “speed”, “bearing”, and “accuracy”? Let’s first take a look at them.

Show code
unique(origin_sf_new$driving_mode)
[1] "car"

Seeing that there is only 1 constant in the column, it is safe for us to drop this column.

Show code
unique(origin_sf_new$osname)
[1] "ios"     "android"

There are 2 values, mainly “ios” and “android”. Arguments can be made that we can analyse the behavior of both type in terms of using grab hailing services, but that’s not what we will doing so we will drop this as well.

As we are analysing start/stop points, speed will not be a relevant factor hence we will be dropping them.

Not relevant as well, hence dropping.

According to research paper published on Grab website, this is the definition of the accuracy column:

“…the accuracy level roughly indicates the radius of the circle within which the true location lies with a certain probability. The lower the accuracy level, the more precise the reported GPS ping is.”

With that, let’s take a look at the distribution of accuracy score.

Show code
plot(origin_sf_new$accuracy)

Show code
ggplot(origin_sf_new, 
       aes(x=rownames(origin_sf_new), y=accuracy)) + 
  geom_point(size = 2)

Show code
ggplot(dest_sf_new, 
       aes(x=rownames(dest_sf_new), y=accuracy)) + 
  geom_point(size = 2)

From the plot, we can see that there are 3 clear outliers with accuracy above 180~ for origin_sf_new, and 1 for dest_sf_new. Now let’s extract these trips.

Show code
origin_sf_new[origin_sf_new$accuracy > 180, ]
Simple feature collection with 1 feature and 16 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 29008.44 ymin: 32353.78 xmax: 29008.44 ymax: 32353.78
Projected CRS: SVY21 / Singapore TM
# A tibble: 1 × 17
  trj_id driving_mode osname pingtimestamp       speed bearing accuracy weekday
  <chr>  <chr>        <chr>  <dttm>              <dbl>   <int>    <dbl> <ord>  
1 59560  car          ios    2019-04-16 00:28:59    -1      26      728 Tue    
# ℹ 9 more variables: start_hr <fct>, day <fct>, SUBZONE_N <chr>,
#   SUBZONE_C <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
#   REGION_C <chr>, geometry <POINT [m]>
Show code
dest_sf_new[dest_sf_new$accuracy > 500, ]
Simple feature collection with 7 features and 16 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28983.51 ymin: 29952.1 xmax: 33721.09 ymax: 34502.5
Projected CRS: SVY21 / Singapore TM
# A tibble: 7 × 17
  trj_id driving_mode osname pingtimestamp       speed bearing accuracy weekday
  <chr>  <chr>        <chr>  <dttm>              <dbl>   <int>    <dbl> <ord>  
1 54788  car          ios    2019-04-09 11:06:56    -1     203     1414 Tue    
2 14443  car          ios    2019-04-11 01:09:18    -1     223     1414 Thu    
3 36434  car          ios    2019-04-18 07:41:35    -1     126      806 Thu    
4 60701  car          ios    2019-04-13 03:16:02    -1     130      818 Sat    
5 24103  car          ios    2019-04-11 13:19:30    -1     117     1402 Thu    
6 58922  car          ios    2019-04-13 15:50:44    -1      31     1414 Sat    
7 68340  car          ios    2019-04-12 11:55:48    -1      10     1414 Fri    
# ℹ 9 more variables: end_hr <fct>, day <fct>, SUBZONE_N <chr>,
#   SUBZONE_C <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
#   REGION_C <chr>, geometry <POINT [m]>

To ensure that our data is of utmost accuracy, we will drop these trips, before we drop the accuracy column as well (as we will not need it anymore).

Show code
origin_sf_new <- subset(origin_sf_new, accuracy < 180)
dest_sf_new <- subset(dest_sf_new, accuracy < 500)

With that done, we can now drop the columns that we don’t need.

Show code
origin_sf_new <- origin_sf_new[, c(1, 4,  8:10, 13, 17)]
dest_sf_new <- dest_sf_new[, c(1, 4,  8:10, 13, 17)]

3.3.7 Duplicated Points

Lastly, let’s check for duplicated points on the map.

Show code
# Check for any duplicates
any(duplicated(origin_sf_new))
[1] FALSE
Show code
# Count the number of duplicates
sum(multiplicity(origin_sf_new) > 1)
[1] 0

No duplicated points!

3.4 Verifying Coordinate System

It is important for the data to be in the right coordinate reference system (CRS). In this assignment, all spatial data will be projected in EPSG:3414, which is a projected coordinate system for Singapore.

Show code
st_crs(mpsz2019_new)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
Show code
st_crs(sg_roads_new)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
Show code
st_crs(origin_sf_new)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
Show code
st_crs(dest_sf_new)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

They are all in the correct CRS!

3.5 Plotting Spatial Data

Finally, plotting all three datasets together to ensure that they have a consistent projection system.

Show code
tm_shape(sg_boundary) +
  tm_polygons() +
tm_shape(sg_roads_new) + 
  tm_lines("PLN_AREA_N") + 
tm_shape(origin_sf_new) +
  tm_dots()

3.6 Exploratory Data Analysis

Before we begin our Geospatial Analysis, let’s first take a closer look at the Grab dataset.

3.6.1 Day of the Week

The distribution of the trips across all 7 days of the week looks even.

Show code
ggplot(origin_sf_new, aes(x=weekday)) + geom_bar()

Show code
ggplot(dest_sf_new, aes(x=weekday)) + geom_bar()

3.6.2 Planning Area

First let us look at the top 10 planning areas for grab ride origin points. Tampines is the Planning Area with the most origin points.

Show code
origin_pl_area <- origin_sf_new %>%
  group_by(PLN_AREA_N) %>%
  summarise(total_count=n()) %>%
  top_n(10, total_count) %>%
  .$PLN_AREA_N

ggplot(origin_sf_new[origin_sf_new$PLN_AREA_N %in% origin_pl_area,], 
       aes(x=PLN_AREA_N)) + geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(title = "Trips Origin Distribution by Planning Area",
       x = "Planning Area",
       y = "Number of Trips")

Then for the destination points.

Show code
dest_pl_area <- dest_sf_new %>%
  group_by(PLN_AREA_N) %>%
  summarise(total_count=n()) %>%
  top_n(10, total_count) %>%
  .$PLN_AREA_N

ggplot(dest_sf_new[dest_sf_new$PLN_AREA_N %in% dest_pl_area,], 
       aes(x=PLN_AREA_N)) + geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  labs(title = "Trips Destination Distribution by Planning Area",
       x = "Planning Area",
       y = "Number of Trips")

6 out of 10 of the Planning Areas remains the same for destination points, mainly TAMPINES, WOODLANDS, YISHUN, QUEENSTOWN, BUKIT MERAH, and CHANGI. This time however, the Planning Area with the most destination points is Changi.

3.6.3 Starting Hour

Show code
origin_sf_new$start_hr <- factor(origin_sf_new$start_hr, levels = 0:23)

ggplot(origin_sf_new, aes(x = start_hr)) +
  geom_bar() +
  labs(title = "Trips Distribution by Start Hour",
       x = "Start Hour",
       y = "Number of Trips")

From the graph, we can see that the starting hour peaks at midnight (12am - 1am) and morning (9am - 10am), the former probably due to the lack of public transport after operating hours, and the latter from rush hour.

4. Kernel Density Estimation (KDE) Layers

4.1 Converting data format

4.1.1 Creating point ppp objects

In the code chunk below, as.ppp() function is used to derive a ppp object layer directly from a sf tibble data.frame.

Show code
origin_ppp <- as.ppp(origin_sf_new)
summary(origin_ppp)
Marked planar point pattern:  27871 points
Average intensity 2.631202e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

marks are of type 'character'
Summary:
   Length     Class      Mode 
    27871 character character 

Window: rectangle = [3628.24, 49845.23] x [26770.58, 49689.64] units
                    (46220 x 22920 units)
Window area = 1059250000 square units
Show code
dest_ppp <- as.ppp(dest_sf_new)
summary(dest_ppp)
Marked planar point pattern:  27811 points
Average intensity 2.645533e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

marks are of type 'character'
Summary:
   Length     Class      Mode 
    27811 character character 

Window: rectangle = [3637.21, 49870.63] x [26770.04, 49507.79] units
                    (46230 x 22740 units)
Window area = 1051240000 square units

4.1.2 Creating owin objects

In the code chunk as.owin() is used to create an owin object class from polygon sf tibble data.frame. In this case, we will be converting the sg_boundary polygon.

Show code
sg_boundary_owin <- as.owin(sg_boundary)

4.1.3 Combining point events object and owin object

We will now combine singapore’s boundary and the origin and destination points into one.

Show code
originSG_ppp = origin_ppp[sg_boundary_owin]
destSG_ppp = dest_ppp[sg_boundary_owin]
Show code
plot(destSG_ppp)

Show code
summary(destSG_ppp)
Marked planar point pattern:  27811 points
Average intensity 4.184642e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

marks are of type 'character'
Summary:
   Length     Class      Mode 
    27811 character character 

Window: polygonal boundary
37 separate polygons (29 holes)
                  vertices         area relative.area
polygon 1            12666  6.63014e+08      9.98e-01
polygon 2              285  1.61128e+06      2.42e-03
polygon 3               27  1.50315e+04      2.26e-05
polygon 4 (hole)        41 -4.01660e+04     -6.04e-05
polygon 5 (hole)       317 -5.11280e+04     -7.69e-05
polygon 6 (hole)         3 -4.14099e-04     -6.23e-13
polygon 7               30  2.80002e+04      4.21e-05
polygon 8 (hole)         4 -2.86396e-01     -4.31e-10
polygon 9 (hole)         3 -1.81439e-04     -2.73e-13
polygon 10 (hole)        3 -8.68789e-04     -1.31e-12
polygon 11 (hole)        3 -5.99535e-04     -9.02e-13
polygon 12 (hole)        3 -3.04561e-04     -4.58e-13
polygon 13 (hole)        3 -4.46076e-04     -6.71e-13
polygon 14 (hole)        3 -3.39794e-04     -5.11e-13
polygon 15 (hole)        3 -4.52043e-05     -6.80e-14
polygon 16 (hole)        3 -3.90173e-05     -5.87e-14
polygon 17 (hole)        3 -9.59850e-05     -1.44e-13
polygon 18 (hole)        4 -2.54488e-04     -3.83e-13
polygon 19 (hole)        4 -4.28453e-01     -6.45e-10
polygon 20 (hole)        4 -2.18616e-04     -3.29e-13
polygon 21 (hole)        5 -2.44411e-04     -3.68e-13
polygon 22 (hole)        5 -3.64686e-02     -5.49e-11
polygon 23              71  8.18750e+03      1.23e-05
polygon 24 (hole)        6 -8.37554e-01     -1.26e-09
polygon 25 (hole)       38 -7.79904e+03     -1.17e-05
polygon 26 (hole)        3 -3.41897e-05     -5.14e-14
polygon 27 (hole)        3 -3.65499e-03     -5.50e-12
polygon 28 (hole)        3 -4.95057e-02     -7.45e-11
polygon 29              91  1.49663e+04      2.25e-05
polygon 30 (hole)        5 -2.92235e-04     -4.40e-13
polygon 31 (hole)        3 -7.43616e-06     -1.12e-14
polygon 32 (hole)      270 -1.21455e+03     -1.83e-06
polygon 33 (hole)       19 -4.39650e+00     -6.62e-09
polygon 34 (hole)       35 -1.38385e+02     -2.08e-07
polygon 35 (hole)       23 -1.99656e+01     -3.00e-08
polygon 36              71  5.63061e+03      8.47e-06
polygon 37              10  1.99717e+02      3.01e-07
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 664597000 square units
Fraction of frame area: 0.433

4.1.4 Rescale

The density values of the output range from 0 to 0.000035 which is way too small to comprehend, and it is computed in “number of points per square meter”. Therefore, we are going to use rescale() to covert the unit of measurement from meter to kilometer.

Show code
originSG_ppp.km <- rescale(originSG_ppp, 1000, "km")
destSG_ppp.km <- rescale(destSG_ppp, 1000, "km")

4.2 Deriving Traditional Kernel Density Estimation (KDE) Layers

4.2.1 Automatic bandwidth selection method

We will first compute the kernel density by using density() of the spatstat package, with the default method bw.diggle().

Show code
kde_originSG_bw <- density(originSG_ppp.km,
                              sigma=bw.ppl,
                              edge=TRUE,
                            kernel="gaussian") 

plot(kde_originSG_bw, main = "Kernel Density Estimation Layer")

Looking at all the different methods, we can see that bw.diggle() is still the best among the automatic bandwidth selection method.

Show code
bw.CvL(originSG_ppp.km)
   sigma 
3.147573 
Show code
bw.scott(originSG_ppp.km)
  sigma.x   sigma.y 
1.5973830 0.9321636 
Show code
bw.ppl(originSG_ppp.km)
    sigma 
0.1238747 
Show code
bw.diggle(originSG_ppp.km)
      sigma 
0.008087202 
Show code
kde_originSG_ppl <- density(originSG_ppp.km, 
                               sigma=bw.ppl, 
                               edge=TRUE,
                               kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_originSG_bw, main = "bw.diggle")
plot(kde_originSG_ppl, main = "bw.ppl")

4.2.2 Computing KDE by using fixed bandwidth

Having tried automatic bandwidth selection method, let’s try computing KDE by using a fixed bandwidth defined by us. In our case, we will define a fixed bandwidth of 800m (or 0.8km).

Show code
kde_originSG_500 <- density(originSG_ppp.km, sigma=0.5, edge=TRUE, kernel="gaussian")
plot(kde_originSG_500)

4.2.3 Computing KDE by using adaptive bandwidth

Fixed bandwidth method, however, is very sensitive to highly skewed distribution of spatial point patterns over geographical units, for example urban versus rural. To overcome this, we can try using adaptive bandwidth instead.

Show code
kde_childcareSG_adaptive <- adaptive.density(originSG_ppp.km, method="kernel")
plot(kde_childcareSG_adaptive)

4.2.4 Method we are using

As the KDE layer using fixed bandwidth with gaussian kernel plots a graph that allows for meaningful analysis at a glance, we will be using that for the steps moving forward.

Show code
kde_originSG_500 <- density(originSG_ppp.km, sigma=0.5, edge=TRUE, kernel="gaussian")
plot(kde_originSG_500)

Show code
kde_destSG_500 <- density(destSG_ppp.km, sigma=0.5, edge=TRUE, kernel="gaussian")
plot(kde_destSG_500)

Show code
par(mfrow=c(1,2))
plot(kde_originSG_500, main = "Origin KDE Layer")
plot(kde_destSG_500, main = "Destination KDE layer")

4.3 Combining KDE layers

4.3.1 Converting KDE layers into grid object

In order for us to map the KDE layer of these points to our map, we first need to convert it into grid object.

Show code
gridded_kde_originSG_500 <- as.SpatialGridDataFrame.im(kde_originSG_500)
spplot(gridded_kde_originSG_500)

Show code
gridded_kde_destSG_500 <- as.SpatialGridDataFrame.im(kde_destSG_500)
spplot(gridded_kde_destSG_500)

4.3.2 Converting KDE layers into grid object

We will then convert the gridded kernel density objects into RasterLayer object by using raster() of raster package. As the RasterLayer object does not include CRS information, we will need to manually assign it to them as well.

Show code
kde_originSG_500_raster <- raster(gridded_kde_originSG_500)
projection(kde_originSG_500_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
kde_originSG_500_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614  (x, y)
extent     : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=km +no_defs 
source     : memory
names      : v 
values     : -1.552973e-14, 591.2389  (min, max)
Show code
kde_destSG_500_raster <- raster(gridded_kde_destSG_500)
projection(kde_destSG_500_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
kde_destSG_500_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614  (x, y)
extent     : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=km +no_defs 
source     : memory
names      : v 
values     : -1.361913e-14, 526.6182  (min, max)

4.3.3 Overlaying KDE layer on tmap plot

To further explore the map, we will now be overlaying the KDE layer both onto OpenStreetMap of Singapore, and also on the Singapore Planning Area layer and OSM road layer that we have pre-processed.

4.3.3.1 Overlay on OpenStreetMap

Show code
tmap_mode("view")
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(kde_originSG_500_raster) + 
  tm_raster("v", alpha = 0.7,
          palette = "YlOrRd") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)
Show code
tmap_mode("plot")
Show code
tmap_mode("view")
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(kde_destSG_500_raster) + 
  tm_raster("v", alpha = 0.7,
          palette = "YlOrRd") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)
Show code
tmap_mode("plot")

As you can see from the plot, there are certain planning areas that are hotspots for hailing of Grab ride service, in particular Central Region (Orchard, Newton etc), Woodlands, Punggol, Tampines, and most notably Changi (where the airport lies).

4.3.3.2 Overlay on Planning Area and OSM Road Layers

To further confirm our observation, let’s plot the KDE layer over our Planning Area and OSM Road Layers.

Show code
tmap_mode("view")
tm_shape(mpsz2019_new) +
  tm_polygons("PLN_AREA_N") +
tm_shape(kde_originSG_500_raster) + 
  tm_raster("v", alpha = 0.7,
          palette = "YlOrRd")
Show code
tmap_mode("view")
tm_shape(mpsz2019_new) +
  tm_polygons("PLN_AREA_N") +
tm_shape(kde_destSG_500_raster) + 
  tm_raster("v", alpha = 0.7,
          palette = "YlOrRd")

The common overlapping Planning Areas include “TAMPINES”, “CHANGI”, “WOODLANDS”, and “NOVENA”, so let’s do a further analysis on these areas.

4.4 In-depth KDE Computation

4.4.1 Data Preparation

To do in-depth KDE computation on these 4 planning areas, we will first need to extract their respective boundaries. In the code below, we extracted their boundaries and converted them to sp’s Spatial* class.

Show code
mpsz <- as_Spatial(mpsz2019_new)
cg = mpsz[mpsz@data$PLN_AREA_N == "CHANGI",]
tp = mpsz[mpsz@data$PLN_AREA_N == "TAMPINES",]
wl = mpsz[mpsz@data$PLN_AREA_N == "WOODLANDS",]
nv = mpsz[mpsz@data$PLN_AREA_N == "NOVENA",]

Plotting down these boundaries.

Show code
par(mfrow=c(2,2))
plot(cg, main = "CHANGI")
plot(tp, main = "TAMPINES")
plot(wl, main = "WOODLANDS")
plot(nv, main = "NOVENA")

Turning the spatial point data frame into generic sp format, then into owin object as done previously.

Show code
cg_sp = as(cg, "SpatialPolygons")
tp_sp = as(tp, "SpatialPolygons")
wl_sp = as(wl, "SpatialPolygons")
nv_sp = as(nv, "SpatialPolygons")

cg_owin = as(cg_sp, "owin")
tp_owin = as(tp_sp, "owin")
wl_owin = as(wl_sp, "owin")
nv_owin = as(nv_sp, "owin")

By using the code below, we will be able to extract grab origin and destination points for these specific areas.

Show code
origin_cg_ppp = origin_ppp[cg_owin]
origin_tp_ppp = origin_ppp[tp_owin]
origin_wl_ppp = origin_ppp[wl_owin]
origin_nv_ppp = origin_ppp[nv_owin]

dest_cg_ppp = dest_ppp[cg_owin]
dest_tp_ppp = dest_ppp[tp_owin]
dest_wl_ppp = dest_ppp[wl_owin]
dest_nv_ppp = dest_ppp[nv_owin]

Next up is the rescale() function used previously as well.

Show code
origin_cg_ppp.km = rescale(origin_cg_ppp, 1000, "km")
origin_tp_ppp.km = rescale(origin_tp_ppp, 1000, "km")
origin_wl_ppp.km = rescale(origin_wl_ppp, 1000, "km")
origin_nv_ppp.km = rescale(origin_nv_ppp, 1000, "km")

dest_cg_ppp.km = rescale(dest_cg_ppp, 1000, "km")
dest_tp_ppp.km = rescale(dest_tp_ppp, 1000, "km")
dest_wl_ppp.km = rescale(dest_wl_ppp, 1000, "km")
dest_nv_ppp.km = rescale(dest_nv_ppp, 1000, "km")

Finally, we plot the four planning areas and the grab hailing origin and destination points

Show code
par(mfrow=c(2,4))
plot(origin_cg_ppp.km, main = "CHANGI ORIGIN")
plot(origin_tp_ppp.km, main = "TAMPINES ORIGIN")
plot(origin_wl_ppp.km, main = "WOODLANDS ORIGIN")
plot(origin_nv_ppp.km, main = "NOVENA ORIGIN")

plot(dest_cg_ppp.km, main = "CHANGI DESTINATION")
plot(dest_tp_ppp.km, main = "TAMPINES DESTINATION")
plot(dest_wl_ppp.km, main = "WOODLANDS DESTINATION")
plot(dest_nv_ppp.km, main = "NOVENA DESTINATION")

4.4.2 Computing KDE

We will now be computing the KDE of each planning area using the fixed bandwidth method.

Show code
par(mfrow=c(1,2))

plot(density(origin_cg_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Changi Origin")

plot(density(dest_cg_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Changi Destination")

Show code
tmap_mode('plot')
tm_shape(mpsz2019_new[mpsz2019_new$PLN_AREA_N == "CHANGI", ]) + 
  tm_polygons('SUBZONE_N')

The hotspot in Changi area is centered around Changi Airport, indicating a likely surge in use of Grab services due to the constant flow of passengers arriving and departing from Singapore.

Show code
par(mfrow=c(1,2))

plot(density(origin_tp_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Tampines Origin")

plot(density(dest_tp_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Tampines Destination")

Show code
tmap_mode('plot')
tm_shape(mpsz2019_new[mpsz2019_new$PLN_AREA_N == "TAMPINES", ]) + 
  tm_polygons('SUBZONE_N')

The hotspot in Tampines area is mainly concentrated around the stretch from Tampines West to Tampines East, encompassing the bulk of where most residents of Tampines currently live (Tampines West, Tampines, Tampines East).

Show code
par(mfrow=c(1,2))

plot(density(origin_wl_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Woodlands Origin")

plot(density(dest_wl_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Woodlands Destination")

Show code
tmap_mode('plot')
tm_shape(mpsz2019_new[mpsz2019_new$PLN_AREA_N == "WOODLANDS", ]) + 
  tm_polygons('SUBZONE_N')

The rides are concentrated around the lower half of Woodlands area, ranging from Woodlands West to Woodlands South, then Woodlands East. However, one prominent hotspot shared across both the origin and destination map is the Woodlands West region, indicating that this might either be the area with the wealthiest residents in Woodlands, or that there are just more residents concentrated here.

Show code
par(mfrow=c(1,2))

plot(density(origin_nv_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Novena Origin")

plot(density(dest_nv_ppp.km, 
             sigma=0.5, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Novena Destination")

Show code
tmap_mode('plot')
tm_shape(mpsz2019_new[mpsz2019_new$PLN_AREA_N == "NOVENA", ]) + 
  tm_polygons('SUBZONE_N')

The Novena area’s notable hotspots present an interesting distinction. Origin points predominantly converge around the affluent Moulmein area, revealing a concentration in the wealthier section of town. Conversely, the destination points gravitate towards the Malcolm area, characterized by a cluster of prestigious schools, as illustrated in the figure below.

Moulmein Area

Google Map View of Malcolm Area, characterized by Prestigious Schools

4.5 Nearest Neighbour Analysis

In this section, we will perform the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of statspat package, to test whether the distribution of Grab ride hailing origin points are randomly distributed.

Using 95% confidence interval, the test hypotheses are:

Ho = The distribution of Grab ride hailing origin points are randomly distributed.

H1= The distribution of Grab ride hailing origin points are not randomly distributed.

For this section, we will be making use of the ppp object.

Clark-Evans Test

Show code
clarkevans.test(originSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  originSG_ppp
R = 0.27981, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
Show code
clarkevans.test(origin_cg_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origin_cg_ppp
R = 0.11405, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
Show code
clarkevans.test(origin_tp_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origin_tp_ppp
R = 0.32408, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
Show code
clarkevans.test(origin_wl_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origin_wl_ppp
R = 0.31779, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
Show code
clarkevans.test(origin_nv_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origin_nv_ppp
R = 0.34615, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

Having performed the Clark-Evans Test on all 4 planning area and Singapore as a whole, all of their p-values are <2.2e-16 < 0.05, thus we reject Ho. This means that the distribution of Grab ride hailing origin points are not randomly distributed which we explored in earlier sections.

Furthermore, as their R value ranges from 0.11647 to 0.35838 which is <1, this suggests that the points are clustering.

5. Network Kernel Density Estimation (NKDE) Layers

In this section, we will be using appropriate functions of spNetwork package:

  • to derive network constrained kernel density estimation (NetKDE), and
  • to perform network G-function and k-function analysis,

where in this case the network refers to OSM’s Road Map of Singapore.

However, due to limitations in computational power, we will be limiting the area of scope down to the 4 areas identified in the previous section, Changi, Tampines, Woodlands, and Novena, and only the Origin points.

5.1 Data Preparation

5.1.1 Initial Data Pre-processing

Before we begin, let us first convert our sg_roads_new data from SFC_GEOMETRY to SFC_LINESTRING.

Show code
sg_roads_linestring <- st_cast(sg_roads_new, "LINESTRING")

5.1.2 Narrowing down the scope

Then, let us narrow down the scope of our data to the 4 areas mentioned.

Show code
# Roads
cg_roads <- sg_roads_linestring %>% filter(PLN_AREA_N == "CHANGI")
tp_roads <- sg_roads_linestring %>% filter(PLN_AREA_N == "TAMPINES")
wl_roads <- sg_roads_linestring %>% filter(PLN_AREA_N == "WOODLANDS")
nv_roads <- sg_roads_linestring %>% filter(PLN_AREA_N == "NOVENA")

# Grab Origin Points
cg_origin <- origin_sf_new %>% filter(PLN_AREA_N == "CHANGI")
tp_origin <- origin_sf_new %>% filter(PLN_AREA_N == "TAMPINES")
wl_origin <- origin_sf_new %>% filter(PLN_AREA_N == "WOODLANDS")
nv_origin <- origin_sf_new %>% filter(PLN_AREA_N == "NOVENA")

5.1.3 Visualising the data

Before we begin our analysis, let us visualise our geospatial data to make sure everything falls into place.

Show code
tm_shape(cg_roads) +
  tm_lines("PLN_AREA_N") +
tm_shape(cg_origin) +
  tm_dots()

Show code
tm_shape(tp_roads) +
  tm_lines("PLN_AREA_N") +
tm_shape(tp_origin) +
  tm_dots()

Show code
tm_shape(wl_roads) +
  tm_lines("PLN_AREA_N") +
tm_shape(wl_origin) +
  tm_dots()

Show code
tm_shape(nv_roads) +
  tm_lines("PLN_AREA_N") +
tm_shape(nv_origin) +
  tm_dots()

5.2 Network Constrained KDE (NetKDE) Analysis

We will now perform NetKDE analysis by using appropriate functions provided in spNetwork package.

5.2.1 Preparing the lixels objects

Before we can compute NetKDE, the SpatialLines object need to be cut into lixels with a specified minimal distance, and this can be done using lixelize_lines() of spNetwork package.

Show code
cg_lixels <- lixelize_lines(cg_roads, 
                         700, 
                         mindist = 350)

tp_lixels <- lixelize_lines(tp_roads, 
                         700, 
                         mindist = 350)

wl_lixels <- lixelize_lines(wl_roads, 
                         700, 
                         mindist = 350)

nv_lixels <- lixelize_lines(nv_roads, 
                         700, 
                         mindist = 350)

5.2.2 Generating line centre points

Next, we will use lines_center() of spNetwork to generate a SpatialPointsDataFrame (i.e. samples) with line centre points.

Show code
cg_lines_center <- lines_center(cg_lixels)
tp_lines_center <- lines_center(tp_lixels)
wl_lines_center <- lines_center(wl_lixels)
nv_lines_center <- lines_center(nv_lixels)

5.2.3 Computing NetKDE

We are now ready to compute NetKDE. As the code is fairly long, we will split it into 4 tabs.

Show code
# Origin
cg_o_densities <- nkde(cg_roads, 
                  events = cg_origin,
                  w = rep(1,nrow(cg_origin)),
                  samples = cg_lines_center,
                  kernel_name = "quartic", # kernel method
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  # method used to calculate NKDE. spNetwork supports 3 popular                                     methods, namely simple, discontinuous, and continuous
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, 
                  # we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
Show code
tp_o_densities <- nkde(tp_roads, 
                  events = tp_origin,
                  w = rep(1,nrow(tp_origin)),
                  samples = tp_lines_center,
                  kernel_name = "quartic", # kernel method
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  # method used to calculate NKDE. spNetwork supports 3 popular                                     methods, namely simple, discontinuous, and continuous
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, 
                  # we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
Show code
wl_o_densities <- nkde(wl_roads, 
                  events = wl_origin,
                  w = rep(1,nrow(wl_origin)),
                  samples = wl_lines_center,
                  kernel_name = "quartic", # kernel method
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  # method used to calculate NKDE. spNetwork supports 3 popular                                     methods, namely simple, discontinuous, and continuous
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, 
                  # we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
Show code
nv_o_densities <- nkde(nv_roads, 
                  events = nv_origin,
                  w = rep(1,nrow(nv_origin)),
                  samples = nv_lines_center,
                  kernel_name = "quartic", # kernel method
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  # method used to calculate NKDE. spNetwork supports 3 popular                                     methods, namely simple, discontinuous, and continuous
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, 
                  # we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)

5.2.4 Reinsert Density

Before we are able to visualise, we first need to insert the computed values back into lines_center and lixels objects as density field.

Show code
cg_lines_center$o_density <- cg_o_densities
cg_lixels$o_density <- cg_o_densities

tp_lines_center$o_density <- tp_o_densities
tp_lixels$o_density <- tp_o_densities

wl_lines_center$o_density <- wl_o_densities
wl_lixels$o_density <- wl_o_densities

nv_lines_center$o_density <- nv_o_densities
nv_lixels$o_density <- nv_o_densities

Since svy21 projection system is in meter, the computed density values are very small i.e. 0.0000005. We will thus need to rescale the density values from number of events per meter to number of events per kilometer.

Show code
cg_lines_center$o_density <- cg_lines_center$o_density*1000
cg_lixels$o_density <- cg_lixels$o_density*1000

tp_lines_center$o_density <- tp_lines_center$o_density*1000
tp_lixels$o_density <- tp_lixels$o_density*1000

wl_lines_center$o_density <- wl_lines_center$o_density*1000
wl_lixels$o_density <- wl_lixels$o_density*1000

nv_lines_center$o_density <- nv_lines_center$o_density*1000
nv_lixels$o_density <- nv_lixels$o_density*1000

5.2.5 Visualising NetKDE

Show code
tmap_mode('view')
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(cg_lixels)+
  tm_lines(col="o_density")+
tm_shape(cg_origin)+
  tm_dots(alpha=0.2)
Show code
tmap_mode('plot')

This tmap plot further reinforces our observation above that the grab ride traffic are from incoming tourists or locals returning home form the airport, as you can see the denser area being the Changi Airport Terminals. However, it is worth highlighting that there some slight traffic along the Changi Village area and infront of the Japanese School as well.

Show code
tmap_mode('view')
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(tp_lixels)+
  tm_lines(col="o_density")+
tm_shape(tp_origin)+
  tm_dots(alpha=0.2)
Show code
tmap_mode('plot')

As we have discovered earlier, a huge portion of the grab rides indeed originated from Tampines East, one of the more populated area of Tampines. Particularly along Tampines Avenue 2, there seems to be a higher density, presumably due to it being more convenient to get a ride along the main road.

Surprisingly, the other higher density area in this network density map is the area around Changi General Hospital.

Show code
tmap_mode('view')
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(wl_lixels)+
  tm_lines(col="o_density")+
tm_shape(wl_origin)+
  tm_dots(alpha=0.2)
Show code
tmap_mode('plot')

There are 3 main points of to focus on with higher density, mainly:

  • Along the route to Woodlands Checkpoint, showing that a significant portion of the rides in Woodlands are people coming in from Malaysia.

  • Around the main hub of Woodlands, along the Woodlands MRT stretch. No surprises here, as the area is perhaps the most dense in terms of human traffic due to concentration of malls, bus interchange, and MRT station.

  • 3 different points around the Sembawang Air Base, which I assume is the entrance. This make sense as well, as military bases in Singapore are generally more inaccessible.

Show code
tmap_mode('view')
tm_basemap(leaflet::providers$OpenStreetMap) +
tm_shape(nv_lixels)+
  tm_lines(col="o_density")+
tm_shape(nv_origin)+
  tm_dots(alpha=0.2)
Show code
tmap_mode('plot')

Network KDE indicates that the majority of the traffic is along Moulmein Road, which is the main road next to several of the moderately wealthier estates in Singapore.

5.3 Network Constrained G- and K-Function Analysis

We are now going to perform complete spatial randomness (CSR) test by using kfunctions() of spNetwork package. The null hypothesis is defined as:

  • The observed spatial point events (i.e distribution of Grab ride hailing points) are uniformly distributed over a street network in the 4 Planning Area specified above.

The CSR test is based on the assumption of the binomial point process which implies the hypothesis that the childcare centres are randomly and independently distributed over the street network.

If this hypothesis is rejected, we may infer that the distribution of Grab ride hailing points are spatially interacting and dependent on each other; as a result, they may form nonrandom patterns.

Show code
kfun_cg <- kfunctions(cg_roads, 
                             cg_origin[c("trj_id","PLN_AREA_N", "geometry")],
                             start = 0,
                             # A double, the start value for evaluating the k and                                  g functions.
                             end = 1000, 
                             #  A double, the last value for evaluating the k                                 and g functions.
                             step = 50, 
                             # A double, the jump between two evaluations of the                               k and g function
                             width = 50,
                             # The width of each donut for the g-function
                             nsim = 50,
                             # number of Monte Carlo simulations required.
                             resolution = 50,
                             verbose = FALSE,
                             agg = 5,
                             conf_int = 0.05
                             #  A double indicating the width confidence interval                               (default = 0.05).
                             )

kfun_cg
$plotk


$plotg


$values
       obs_k    lower_k    upper_k    obs_g    lower_g    upper_g distances
1    0.00000   0.000000   0.000000 14.46260  0.5516157  0.8107405         0
2   32.75743   2.397099   2.875543 39.67653  2.7727129  3.3586704        50
3   73.63982   5.408058   6.466411 37.85016  3.2183762  4.0689220       100
4  112.49292   9.140587  10.717579 42.18582  3.9069690  4.7748808       150
5  155.30314  13.474877  15.924465 44.29316  4.5286343  5.4954740       200
6  201.05973  18.323515  21.686283 46.36538  5.0203469  6.3887519       250
7  249.41538  23.628353  28.272305 48.76931  5.4541077  6.7651462       300
8  298.82079  29.415537  34.935597 49.48736  5.9797719  7.3893481       350
9  349.27987  35.906339  42.264261 51.11470  6.5138264  8.0201842       400
10 398.82577  42.984269  50.123663 51.05616  6.7877806  8.5522875       450
11 452.44586  50.432935  58.572925 53.86205  7.2578345  9.0843908       500
12 503.59568  57.774478  67.476826 51.20446  7.8849632  9.6362016       550
13 557.14943  66.338473  77.065027 52.96838  8.6410689 10.0014739       600
14 610.77343  75.284911  87.209917 54.93132  8.9575602 10.8094826       650
15 665.84134  84.570474  97.699592 55.50109  9.5907377 11.3683179       700
16 720.21461  94.462679 109.192789 53.62790 10.1043042 12.1374969       750
17 772.87859 104.408152 121.010673 50.75567 10.4178686 12.8471631       800
18 824.82842 114.997730 133.466222 51.92641 10.7778725 13.0436531       850
19 876.22800 126.127606 147.005685 51.00933 11.2736827 13.7718561       900
20 926.42952 138.127736 161.250522 48.61321 11.6245158 14.4294241       950
21 973.50125 150.496845 175.110379 46.47075 12.5480380 14.8136234      1000

The blue line represents the empirical network K-function of the Grab ride hailing origin points in Changi planning area. The gray envelop represents the results of the 50 simulations in the interval 2.5% - 97.5%. Because the blue line is above the gray area, we can infer that these origin points in Changi planning area are in clusters, which reinforces our observations made above.

Show code
kfun_tp <- kfunctions(tp_roads, 
                             tp_origin[c("trj_id","PLN_AREA_N", "geometry")],
                             start = 0,
                             # A double, the start value for evaluating the k and                                  g functions.
                             end = 1000, 
                             #  A double, the last value for evaluating the k                                 and g functions.
                             step = 50, 
                             # A double, the jump between two evaluations of the                               k and g function
                             width = 50,
                             # The width of each donut for the g-function
                             nsim = 50,
                             # number of Monte Carlo simulations required.
                             resolution = 50,
                             verbose = FALSE,
                             agg = 10,
                             conf_int = 0.05
                             #  A double indicating the width confidence interval                               (default = 0.05).
                             )

kfun_tp
$plotk


$plotg


$values
        obs_k    lower_k    upper_k     obs_g    lower_g    upper_g distances
1    0.000000   0.000000   0.000000  3.063765  0.4252971  0.6246597         0
2    9.342307   1.395974   1.747508 13.672235  1.6972680  2.0154059        50
3   24.414290   3.393955   3.926847 15.902540  2.1632217  2.6895795       100
4   41.077689   5.877783   6.698611 16.683727  2.6040554  3.2950725       150
5   57.738184   8.949099  10.237187 16.776657  3.0983235  3.8525035       200
6   75.147922  12.289475  14.267128 18.022491  3.6689680  4.6249790       250
7   93.338848  16.413216  18.972985 18.347744  4.3232489  5.3652196       300
8  111.825986  21.099615  24.473823 19.419336  5.0627635  6.0016406       350
9  132.000374  26.687575  30.803475 20.456079  5.8061985  7.0565342       400
10 152.970469  32.948982  38.474795 21.155954  6.5278532  7.9718886       450
11 173.673392  39.980977  46.807802 21.321484  7.4815410  8.8539917       500
12 195.654094  48.127544  56.024073 22.785122  8.2920594  9.9342957       550
13 219.049065  57.010286  66.276652 23.243961  9.0238782 10.9897702       600
14 243.294920  66.749139  77.836340 24.861513 10.0513287 12.1715705       650
15 268.870827  77.282975  90.201030 26.275782 10.8772385 13.4018684       700
16 295.985877  88.803459 104.460026 27.748132 11.9221132 14.1378981       750
17 324.674919 101.480769 119.305059 29.891315 12.9081810 14.8660869       800
18 355.382270 115.071545 134.682983 32.104196 13.6684595 16.1183102       850
19 388.200861 129.395011 151.370196 32.632732 14.3912758 17.3966700       900
20 421.193694 144.538289 168.833804 34.090562 15.6930131 18.5902317       950
21 456.100292 160.977641 187.707615 35.591952 16.5844091 19.7898919      1000

Similar to Changi planning area, as the blue line is above the grey area, we can infer that the Tampines planning area consists of mainly origin points in clusters.

Show code
kfun_wl <- kfunctions(wl_roads, 
                             wl_origin[c("trj_id","PLN_AREA_N", "geometry")],
                             start = 0,
                             # A double, the start value for evaluating the k and                                  g functions.
                             end = 1000, 
                             #  A double, the last value for evaluating the k                                 and g functions.
                             step = 50, 
                             # A double, the jump between two evaluations of the                               k and g function
                             width = 50,
                             # The width of each donut for the g-function
                             nsim = 50,
                             # number of Monte Carlo simulations required.
                             resolution = 50,
                             verbose = FALSE,
                             agg = 5,
                             conf_int = 0.05
                             #  A double indicating the width confidence interval                               (default = 0.05).
                             )

kfun_wl
$plotk


$plotg


$values
        obs_k    lower_k    upper_k    obs_g    lower_g   upper_g distances
1     0.00000   0.000000   0.000000 12.79035  0.9533679  1.352140         0
2    30.72008   3.123682   4.016162 36.94501  3.9656899  5.000775        50
3    69.60787   7.631550   9.515653 41.21513  5.0057822  6.551600       100
4   111.78038  13.519998  16.776270 42.55706  6.1307964  7.727287       150
5   154.50168  20.529455  25.162503 42.34075  7.7605347  9.237253       200
6   198.64501  29.001010  34.958358 46.01402  9.0423739 10.918265       250
7   245.62842  38.954291  46.616284 46.06609 10.8111116 12.788549       300
8   291.74659  50.891618  60.112849 48.20917 12.3479166 14.774999       350
9   340.81299  64.270014  76.006453 49.31876 14.4002613 17.087517       400
10  391.33748  79.941699  93.993260 52.93195 16.3578700 19.902155       450
11  445.41507  97.808734 115.618287 53.94941 18.3883834 22.144172       500
12  501.55562 117.921792 138.632107 58.50394 20.9144077 24.916950       550
13  560.44011 139.800584 165.302573 58.34371 23.3707319 28.416171       600
14  619.75722 163.948631 194.764845 61.11969 25.5252232 31.061767       650
15  680.92097 190.813977 226.671825 62.37750 27.9781426 33.145156       700
16  745.74999 219.338704 261.039535 66.83590 30.7879741 36.486951       750
17  814.30034 251.274124 298.849585 71.84709 32.6973136 39.164793       800
18  890.13315 286.617033 339.844403 78.82910 36.4999696 43.218609       850
19  971.08530 324.414465 383.365847 82.05774 38.2004093 45.601628       900
20 1054.44490 364.342953 429.697723 86.54017 40.5986503 48.330944       950
21 1146.03231 407.078470 478.914739 95.30875 43.2594680 50.332616      1000

Similar to Changi planning area, as the blue line is above the grey area, we can infer that the Woodlands planning area consists of mainly origin points in clusters.

Show code
kfun_nv <- kfunctions(nv_roads, 
                             nv_origin[c("trj_id","PLN_AREA_N", "geometry")],
                             start = 0,
                             # A double, the start value for evaluating the k and                                  g functions.
                             end = 1000, 
                             #  A double, the last value for evaluating the k                                 and g functions.
                             step = 50, 
                             # A double, the jump between two evaluations of the                               k and g function
                             width = 50,
                             # The width of each donut for the g-function
                             nsim = 50,
                             # number of Monte Carlo simulations required.
                             resolution = 50,
                             verbose = FALSE,
                             agg = 5,
                             conf_int = 0.05
                             #  A double indicating the width confidence interval                               (default = 0.05).
                             )

kfun_nv
$plotk


$plotg


$values
        obs_k    lower_k    upper_k    obs_g   lower_g   upper_g distances
1    0.000000  0.0000000  0.0000000 1.891585 0.0727873 0.1460171         0
2    3.964585  0.2654856  0.3716798 4.179186 0.3284278 0.4728962        50
3    8.256602  0.6657052  0.8873856 4.486707 0.4080735 0.6464575       100
4   12.528708  1.1348404  1.5996614 4.053080 0.5009935 0.7910365       150
5   16.840637  1.7497714  2.4107199 4.433610 0.6691343 0.9547526       200
6   21.367167  2.4868258  3.4915782 4.924758 0.7905940 1.1951277       250
7   26.568472  3.4002069  4.7764179 5.110598 0.9321863 1.4483346       300
8   31.769778  4.4104903  6.3386901 5.546437 1.1191325 1.6916964       350
9   37.727717  5.7749757  8.0964261 6.449088 1.3161449 1.9736643       400
10  44.433441  7.2847038 10.0214180 6.807493 1.5379360 2.2082872       450
11  51.333855  8.9061572 12.3385542 6.869440 1.7227804 2.4496578       500
12  58.046216 10.8280517 14.8793621 6.413690 1.8392622 2.7996563       550
13  64.479817 12.7920921 17.8482655 6.834042 2.1097256 2.9377089       600
14  71.767397 15.1595600 20.9082084 7.670322 2.3241053 3.2169112       650
15  79.572674 17.6524699 24.2924861 7.814126 2.5285292 3.4871534       700
16  87.760692 20.2134104 27.9338425 8.767662 2.6869356 3.6581704       750
17  96.636761 22.9558767 31.6077208 9.196864 2.7926873 3.9455586       800
18 105.981854 26.0478991 35.6042748 8.993325 2.9986599 4.2550706       850
19 115.021639 29.2103858 39.9798094 9.289784 3.2698977 4.3002031       900
20 124.893279 32.4049520 44.3921802 9.953498 3.2818446 4.6781883       950
21 134.570229 35.8630126 49.2246820 9.840667 3.4491005 4.5292951      1000

Similar to Changi planning area, as the blue line is above the grey area, we can infer that the Novena planning area consists of mainly origin points in clusters.

The results of our G- and K-Function Analysis on all four planning area shows a spatial pattern of clustering among the grab origin points, which supports the idea that grab rides are commonly booked at the same location within an area, possibly due to designated pickup points or taxi stands.

6. Conclusion

In conclusion, our analysis of Grab ride-hailing origin points in the specific planning areas of Changi, Tampines, Woodlands, and Novena, and also the whole of Singapore uncovered noteworthy spatial patterns. The observed clustering of origin points within these areas suggests a localized preference for specific pickup locations, potentially driven by factors such as designated pickup points, popular landmarks, transportation hubs, or simply area with higher population density.

These findings hold practical implications for both Grab and urban planners as the identified clusters can guide Grab in optimizing their service by strategically placing vehicles or promoting the use of specific pickup points, ultimately enhancing the efficiency and user experience. Urban planners, on the other hand, can leverage this information to make informed decisions regarding infrastructure development, such as improving the accessibility of popular pickup locations or adjusting traffic flow in areas with high ride-hailing activity.

Moreover, understanding the spatial dynamics of Grab ride-hailing services contributes to a broader perspective on urban mobility patterns. This knowledge can be valuable for city officials, transportation authorities, and policymakers in crafting policies that support sustainable and efficient transportation solutions. By aligning urban planning efforts with the observed ride-hailing patterns, cities can work towards creating more resilient, user-friendly, and accessible transportation systems.

In essence, our analysis not only sheds light on the localized behaviors of Grab users but also opens avenues for strategic decision-making that can enhance the overall urban mobility landscape. As technology continues to shape the future of transportation, such spatial insights play a crucial role in fostering innovation and creating urban environments that are responsive to the evolving needs of their residents.